Advances in the recovery of haplotypes from the metagenome

Authors

  • Samuel M. Nicholls
  • Wayne Aubrey
  • Leander Schietgat
  • Christopher J. Creevey
  • Amanda Clare

Abstract

High-throughput DNA sequencing has enabled us to look beyond consensus reference sequences to the variation observed in sequences within organisms: their haplotypes. Recovery, or assembly, of haplotypes has proved computationally difficult, and many probabilistic heuristics exist that attempt to recover the original haplotypes for a single organism of known ploidy. However, existing approaches make simplifications or assumptions that are easily violated when investigating sequence variation within a metagenome. We propose the metahaplome as the set of haplotypes for any particular genomic region of interest within a metagenomic data set. We also present Hansel and Gretel, a data structure and algorithm that together provide a proof-of-concept framework for recovering true haplotypes from a metagenomic data set. The algorithm performs incremental haplotype recovery using smoothed Naive Bayes, a simple, efficient and effective method. Hansel and Gretel offer several advantages over existing solutions. The framework can recover haplotypes from metagenomes, does not require a priori knowledge about the input data, and makes no assumptions regarding the distribution of alleles at variant sites. It is also robust to error and uses all available evidence from aligned reads without altering or discarding observed variation. We evaluate our approach using synthetic metahaplomes constructed from sets of real genes and show that up to 99% of SNPs on a haplotype can be correctly recovered from short reads that originate from a metagenomic data set.
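The abstract names Hansel (a data structure holding evidence from aligned reads) and Gretel (an algorithm that recovers haplotypes with smoothed Naive Bayes), but it does not describe how either works internally. The Python sketch below is a hypothetical illustration built on our own assumptions. It is not the authors' implementation. It assumes that the evidence store counts pairs of alleles seen together on the same read at two variant sites. It also assumes that a haplotype is extended one variant site at a time by choosing the allele with the highest add-alpha (Laplace) smoothed Naive Bayes score over the last few chosen sites. The names `PairwiseEvidence`, `log_score` and `recover_haplotype`, and the `alpha` and `lookback` parameters, are illustrative only.

```python
from collections import defaultdict
from math import log


class PairwiseEvidence:
    """Hypothetical Hansel-like store (illustration only): counts how often
    allele a at variant site i and allele b at a later site j are observed
    together on the same aligned read."""

    def __init__(self, alleles):
        self.alleles = list(alleles)
        self.pair = defaultdict(float)    # (i, j, a, b) -> co-occurrence count
        self.single = defaultdict(float)  # (i, a) -> reads showing allele a at site i

    def add_read(self, observed):
        """Record one read; `observed` maps variant-site index -> allele."""
        sites = sorted(observed)
        for x, i in enumerate(sites):
            self.single[(i, observed[i])] += 1
            for j in sites[x + 1:]:
                self.pair[(i, j, observed[i], observed[j])] += 1


def log_score(ev, path, j, b, alpha=1.0, lookback=3):
    """Add-alpha smoothed Naive Bayes log-score for allele b at site j, treating
    each of the last `lookback` (site, allele) choices in `path` as independent
    evidence. Smoothing keeps unseen pairs from zeroing the product."""
    k = len(ev.alleles)
    total_j = sum(ev.single.get((j, c), 0.0) for c in ev.alleles)
    # Prior: P(b at j) estimated from single-site counts.
    score = log((ev.single.get((j, b), 0.0) + alpha) / (total_j + alpha * k))
    for i, a in path[-lookback:]:
        # Likelihood: P(a at i | b at j) estimated from pairwise co-occurrence counts.
        joint = ev.pair.get((i, j, a, b), 0.0)
        given_b = sum(ev.pair.get((i, j, c, b), 0.0) for c in ev.alleles)
        score += log((joint + alpha) / (given_b + alpha * k))
    return score


def recover_haplotype(ev, n_sites, alpha=1.0, lookback=3):
    """Greedily extend one haplotype site by site, keeping the best-scoring allele."""
    path = []
    for j in range(n_sites):
        best = max(ev.alleles,
                   key=lambda b: log_score(ev, path, j, b, alpha, lookback))
        path.append((j, best))
    return "".join(allele for _, allele in path)


if __name__ == "__main__":
    ev = PairwiseEvidence("ACGT")
    # Two toy haplotypes over 4 variant sites; "ACGT" is 3x more abundant.
    for hap, copies in (("ACGT", 3), ("TGCA", 1)):
        for _ in range(copies):
            for i in range(3):  # each read spans two adjacent variant sites
                ev.add_read({i: hap[i], i + 1: hap[i + 1]})
    print(recover_haplotype(ev, 4))  # prints ACGT
```

Smoothing matters in the toy example. No read spans sites 0 and 2, so that pair contributes the same factor (1/4) to every candidate allele rather than a zero that would eliminate all of them. The sketch recovers only the most strongly supported haplotype. Recovering the rest of a metahaplome would also require updating the stored evidence after each recovery, and this sketch does not model that step.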

Publication date: 2016